Generating image features based on robust feature-learning

ABSTRACT

Techniques for increasing robustness of a convolutional neural network based on training that uses multiple datasets and multiple tasks are described. For example, a computer system trains the convolutional neural network across multiple datasets and multiple tasks. The convolutional neural network is configured for learning features from images and accordingly generating feature vectors. By using multiple datasets and multiple tasks, the robustness of the convolutional neural network is increased. A feature vector of an image is used to apply an image-related operation to the image. For example, the image is classified or indexed, or objects in the image are tagged, based on the feature vector. Because the robustness is increased, the accuracy of the generated feature vectors is also increased. Hence, the overall quality of an image service is enhanced, where the image service relies on the image-related operation.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/166,164, filed May 26, 2016 and entitled “GENERATING IMAGE FEATURES BASED ON ROBUST FEATURE-LEARNING,” the disclosure of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

This disclosure relates generally to techniques for image processing and, more specifically, to generating features from images based on robust feature-learning that spans multiple training datasets and multiple tasks. The features are usable for, for example, image classification, tagging, and indexing.

BACKGROUND

For image-related services, client-server architectures are widely used. For example, an image service can be hosted on a server to provide a client (e.g., a mobile device) access to a collection of images maintained by the server. The client can search, download, and use images available from the server.

Generally, existing image services extract features from images, where these features support various operations related to the images, some of which are made accessible to clients. Many of the existing image services implement a feature-learning framework for learning the features. Feature-learning involves using a feature-embedding function to generate compact and representative feature vectors. A feature vector includes values representing different visual content from an image. For example, in an image of a tree, a feature vector includes numerical values representing a trunk, branch, or leaf of the tree. The feature vector is subsequently used for operations such as classifying the image. Hence, the image is classified in a “tree” category.

A commonly used feature-learning framework implements a convolutional neural network. The convolutional neural network includes multiple layers for processing images and, accordingly, generating feature vectors. The accuracy of learning features depends on how well the convolutional neural network is trained. Generally, training is driven by a training dataset and a training task. The training dataset is labeled. The training task is defined by the labels and by a type of cost function (e.g., a regression loss, a classification loss, or other loss-based functions).

Current solutions for training a convolutional network for a specific task involve a single training dataset having labels specific to that training dataset. Many reasons exist for this training approach. In one example, layers of the convolutional neural network are set to learn features according to the training task. In another example, the computational burden associated with the training may require this approach. Generally, the more advanced the training (e.g., the larger the training dataset), the higher the computational burden becomes.

Once a convolutional neural network is trained, the trained network can be used to learn features from an image. The feature learning can be highly accurate if the image falls within the boundary of the training (e.g., the labels used for the training can be properly applied to the image). Otherwise, the robustness of the convolutional neural network suffers. Specifically, in many real-world scenarios, a single training dataset and training task can be insufficient for learning robust and generalizable feature representations due to, for example, label noise, imbalance of label distribution, and shift in data distribution within the training dataset. The impact on the robustness and, thus, on the accuracy of learning features can reduce the quality of service provided to clients.

To illustrate this impact on the robustness and accuracy, consider an example of image classification based on gender. In this illustrative example, known gender images are used to train a convolutional neural network for gender classification. Thus, this network can accurately generate features for classifying people based on gender. With respect to gender-based classification, an image service that relies on the convolutional neural network to classify images can perform well. However, using the convolutional neural network with features for other classifications (e.g., for age-based classification) may provide less accurate results. The quality of service can similarly degrade for the other classifications.

SUMMARY

One exemplary embodiment involves increasing robustness of generating features representing content of images based on a multiple dataset and task training. In particular, multiple training datasets and multiple training tasks are used to train a convolutional neural network. As used herein, a “training domain” represents a training dataset and one or more associated training tasks. Generally, the convolutional neural network is trained across different training domains. For example, one training domain relates to gender-based images (e.g., training images labeled for gender classification), while another training domain relates to age-based images (e.g., training images labeled for age classification).

In contrast to existing systems that use a single training domain (e.g., a single dataset training or a single task training), the multiple domain training results in more generalizable feature representations. That is because the feature representations are learned across all of the different training domains. For instance, the convolutional neural network is trained to generate feature representations applicable to gender and age, individually and in combination. Hence, training enables the convolutional neural network to learn each feature individually and in combination with another feature, and to transfer knowledge about this feature to the other feature (e.g., how gender can impact an age prediction). In addition, the training is more robust to, for example, label noise, imbalance of label distribution, and shift in data distribution. In comparison, an existing system that uses a single training domain learns feature representations specific to the single domain and is insufficient for learning robust and generalizable feature representations outside of that domain.

Hence, a convolutional neural network trained by using multiple domains is more robust. When a new image is processed through this trained convolutional neural network, more accurate and relevant features are generated for that image. The quality of image services that rely on such features to perform image-related operations is thereby enhanced.

In an example that illustrates the improvement to an image service, a user operates a client to request (e.g., search or browse) an image from a server. To respond, the server selects the image from a collection of images. For instance, if the client searches for an image that contains a particular gender and age, the server responds with such an image.

In this example, the server executes an image service that implements a convolutional neural network. The image service tags, indexes, and/or performs other image-related operations on the images to enable the above client-server interaction. Such operations are performed on the images based on features learned from the images. The convolutional neural network is trained across multiple training domains to learn the features. For instance, one training domain A contains known images of people, labeled with gender of the people, for gender classification. A second training dataset B contains known images of people, labeled with age of the people, for age prediction. Yet another training dataset C contains known images of natural scenery, labeled with what each depicted scenery includes, for identification of scenery type. Because a multiple domain training is used, more generalizable feature representations are learned. These feature representations apply to age, gender, and scenery, individually and in combination. Thus, when used for an image operation (e.g., image classification, tagging, and/or indexing), the feature representations enable more accurate results when performing the image-related operations.

Next, the trained convolutional neural network processes a new image of a young man wrapping his arm around his girlfriend, where both are sitting around a campfire on a beach in the early evening, to learn the features from the image. The learned features allow the image service to classify the new image as relating to youth, male and female genders, and natural scenery and to tag the image as containing a young couple, a campfire, a beach, and an evening skyline. When the client searches for a campfire beach image, the server can respond with the new image in the search results.

These illustrative features are mentioned not to limit or define the disclosure, but to provide examples to aid understanding thereof. These and additional features may be implemented independently in various embodiments or may be combined in yet other embodiments, further details of which can be seen with reference to the following description and illustrations. Advantages offered by one or more of the various embodiments may be further understood by examining the specification or by practicing one or more of the various embodiments.

BRIEF DESCRIPTION OF THE FIGURES

These and other features, embodiments, and advantages of the present disclosure are better understood when the following Detailed Description is read with reference to the accompanying drawings.

FIG. 1 illustrates an example of a client-server architecture for providing an image service, according to embodiments of the present disclosure.

FIG. 2 illustrates an example of an implementation for generating features of an image and processing the image based on the features, according to embodiments of the present disclosure.

FIG. 3 illustrates another example of an implementation for generating features of an image and processing the image based on the features, according to embodiments of the present disclosure.

FIG. 4 illustrates an example of a convolutional neural network that is trained across multiple training datasets and multiple training tasks, according to embodiments of the present disclosure.

FIG. 5 illustrates another example of a convolutional neural network that is trained across multiple training datasets and multiple training tasks, according to embodiments of the present disclosure.

FIG. 6 illustrates an example of a flow for training a feature generator, generating features from image data of an image, and performing an image-related operation based on the features, according to embodiments of the present disclosure.

FIG. 7 illustrates an example of a flow for training a feature generator based on batches of training data, according to embodiments of the present disclosure.

FIG. 8 illustrates another example flow for training a feature generator based on batches of training data, according to embodiments of the present disclosure.

FIG. 9 illustrates another example flow for training a feature generator based on batches of training data, according to embodiments of the present disclosure.

FIG. 10 illustrates an example of a computing environment suitable for use in embodiments of the present disclosure.

DETAILED DESCRIPTION

Generally, the embodiments described herein involve increasing robustness of generating image features. More specifically, the robustness is increased by training a convolutional neural network across multiple training domains, where the training domains are different. Each training domain includes a training dataset and one or more associated training tasks. The convolutional neural network is trained by inputting data from each training dataset and updating the parameters of the convolutional neural network according to the respective training tasks. Once updated, the parameters enable the generation of feature representations from images.

In an example, features of an image are generated based on analyzing the image, where the analysis involves the convolutional neural network. The features can be organized in a feature representation, such as a feature vector of the image. The convolutional neural network is trained to learn the features based on multiple training datasets and multiple training tasks. Each training dataset has a set of labels. The labels need not be the same across the training datasets. A training task is defined by a set of data labels and a loss function. The training updates parameters based on minimizing the loss functions. Unlike existing systems, by using multiple training datasets and multiple training tasks, the multiple domain training of the present disclosure results in more generalizable feature representations. In other words, whereas an existing system may be good at training a convolutional neural network to learn a feature (e.g., gender), the multiple domain training enables the convolutional neural network to learn that feature individually and in combination with another feature (e.g., gender and age), and to transfer knowledge about this feature to the other feature (e.g., how gender can impact an age prediction).

As used herein, a “training dataset” represents a set of known data (images with known labels) usable for training. As also used herein, a “training task” represents a task for training the convolutional neural network given a training dataset. Generally, a training task is associated with a loss function. Training involves minimizing the loss functions such that the convolutional neural network learns the best or most optimized representations of known data that satisfy the training task (e.g., what features from the images represent a label at the minimal cost).

In an example, each training dataset is mapped to an input layer of the convolutional neural network. Each training task is mapped to another layer of the convolutional neural network. For clarity of explanation, such a layer is referred to herein as an output layer. A loss function is associated with a training task mapped to the output layer. Hidden layers of the convolutional neural network are set up between the input layers and the output layers and are trained to learn feature representations. The training includes inputting training data from the training datasets at the input layers, processing the training data through the various layers, updating the weights of the output layers such that the loss functions are minimized, and back-propagating the weights to the hidden layers. The back-propagation uses, for example, a stochastic gradient descent (SGD) algorithm to cumulatively update the parameters of the hidden layers. The parameters include, for instance, weights, biases, and/or other hidden layer-related parameters. Hence, the hidden layers are trained (e.g., the parameters of these layers are updated) across the different training datasets and training tasks.

Once trained, image data of a new image is input to the convolutional neural network. Accordingly, a feature vector of the image (or some other feature representation) is an output from the convolutional neural network.

The multiple dataset and task training differs from existing trainings of a convolutional neural network in many aspects. These aspects include, for example, data augmentation, multiple task training, and model averaging. In particular, the disclosed training allows data augmentation from multiple training domains. Specifically, the training involves a larger number of training datasets, each of which can have a focus on particular features. In addition, the training datasets need not have the same labels. Thus, there is no need to re-label diverse training datasets. Instead, the training datasets can be used as currently labeled. In addition, because a label generally relates to a training task, training tasks can be applied across multiple training datasets. The training across these datasets and tasks also represents, in a way, an averaging of individual trainings (e.g., individual training for each dataset and task). This averaging is achieved without the computational burden of performing the individual trainings separately and subsequently combining them in a collective training.

In comparison to the existing trainings, using multiple training datasets and training tasks provides various technical advantages, as further described in the present disclosure. For example, augmenting the training data is more flexible because different training datasets can be used regardless of the data labels and/or the training tasks. In another example, the feature-learning results in more generalizable feature representations and is more robust to, for example, label noise, imbalance of label distribution, and shift in data distribution. In yet another example, the computational burden is reduced relative to training individual models, where each individual model is specific to a training dataset and training task.

These technical advantages also positively affect the operations (e.g., classification, tagging, indexing, retrieval) of an image-related service. In particular, the operations become more accurate and can use a larger and more diverse collection of images. These improvements translate to an overall better quality of service.

In an illustration, the image-related service maintains a large collection of images, in the thousands and even millions, for a large number of users. New images become available on a daily basis and are processed and added to the already large collection. Thousands if not millions of users also operate clients to access some or all of the images on a daily basis. Properly classifying, tagging, and indexing the images (existing or new) becomes crucial for the user access. Absent the disclosed training, the analysis of this great amount of data may sacrifice accuracy to alleviate the computational burden. In comparison, the disclosed training achieves more accuracy while also being computationally efficient.

Specific details of various exemplary embodiments of the present invention are set forth in the following description and are illustrated in FIGS. 1-10. The various embodiments described herein can be implemented in hardware, software, or a combination thereof. In the figures, similar symbols may identify similar components, unless context dictates otherwise. Certain well-known technology details, such as methods, apparatus, or systems that would be known by one of ordinary skill, are not set forth in the following description or in the figures to avoid unnecessarily obscuring the various examples. Those of ordinary skill in the relevant art will understand that they can practice other examples of the disclosed subject matter without departing from the scope and spirit of the present invention.

Turning to FIG. 1, the figure illustrates a client-server architecture for providing an image service 110. A server 102 hosts the image service 110. A client 130 communicates with the server 102 over a data network to access functionalities of the image service 110. The data network includes a public data network, such as the Internet, and/or a private data network, such as an intranet. Example functionalities provided to the client 130 include uploading, searching, downloading, recognizing objects, and/or other functionalities related to images. Hence, the client 130 can send an image request 132 to the server 102. The server 102 responds with an image 120 or an identifier (e.g., a uniform resource locator (URL)) of a storage space storing the image 120.

In an example, the server 102 represents a suitable computing system for hosting the image service 110. The computing system can be implemented on physical resources, such as a collection of servers or a server farm, or within a cloud-based computing service. The client 130 represents a suitable end user device such as a mobile device, a desktop, or another computing device capable of communicating with the server 102 over a data network.

In an example, the image service 110 represents an image service application. For instance, the image service 110 implements a number of modules to maintain a collection of images and enable the application of operations on such images. Each module can be implemented in hardware, software hosted on hardware, or a combination of hardware and software. A feature generator and image processor 112 is an example of such modules.

Generally, the feature generator and image processor 112 is configured to generate features from an image and to process the image based on the features. In an example, the configuration implements a convolutional neural network. As further illustrated in FIG. 2, the convolutional neural network learns the features (e.g., generates a feature vector). For instance, a feature layer of the convolutional neural network is trained to learn the features. In another illustration, as further described in connection with FIG. 3, the convolutional neural network also initiates the processing of the image. For instance, an output layer of the convolutional neural network is mapped to tasks. Particular tasks are selected and initiated for processing the image.

As illustrated in FIG. 1, the feature generator and image processor 112 is trained using multiple image training datasets 114. Each training dataset includes training data labeled for a particular training task. In an example, the training data can include image data corresponding to training images. The images need not have the same resolution, size, and/or file extension type. For instance, while one training dataset corresponds to low resolution JPEG images, another training dataset corresponds to high resolution BMP images. In addition, the training data need not be limited to image data. Instead, the training data can correspond to other data modalities. For instance, while one training dataset includes image data, another training dataset includes non-image data that corresponds to at least one of a text file or an audio file.

Once trained, the feature generator and image processor 112 analyzes a new image 140 (or any number of images). Such images can be published from various online sources and/or generated locally to the image service 110, periodically or at time intervals.

The analysis includes generating the features of the new image 140. For instance, image data of the new image 140 is input to the convolutional neural network. An output of the convolutional neural network is the learned features. The analysis also includes applying one or more image-related operations 116 to the new image 140 based on the features. The image-related operations 116 include, for instance, classifying, tagging, or indexing the new image 140 according to the features. The processed image is added to a collection of processed images 118. The processed images 118 represent stock images available to the client 130. For instance, the processed images 118 are classified, tagged, and/or indexed, thereby enabling the client access to various functionalities of the image service 110.

To illustrate, consider an example of three training datasets 114. Training dataset A contains known images of people for gender classification. Training dataset B contains known images of people for age prediction. Training dataset C contains known images of natural scenery for identification of scenery type. The feature generator and image processor 112 is accordingly trained. In this example, the new image 140 is of a young man wrapping his arm around his girlfriend, where both are sitting around a campfire on a beach in the early evening. Thus, the new image 140 is added as a stock image, where the stock image is classified as relating to youth, male and female genders, and natural scenery. The stock image is also tagged as containing an image of a young couple, a campfire, a beach, and an evening skyline. When a request of a client for a campfire beach image is received, the stock image can be included in the search results.

FIG. 2 illustrates an example of an implementation for generating features of an image and processing the image based on the features. In this example implementation, the feature generation and the image processing are distributed across multiple modules. In particular, a feature generator 210 is configured to generate image features. An image processor 220 is configured to apply one or more image-related operations according to the image features to generate processed images.

In an example, the feature generator 210 implements a convolutional neural network 212. The convolutional neural network 212 is trained based on multiple training datasets 214 and respective training tasks. Accordingly, parameters of the convolutional neural network 212 are updated. Further, the convolutional neural network 212 includes a feature layer. The feature layer outputs image features usable to the image processor 220. Hence, image data of a new image 216 are input to the convolutional neural network 212. Image features 218 of the image 216 are output from the feature layer.

In an example, the image processor 220 maintains a feature space 222. The feature space 222 maps image features to known images, where these known images may have already been processed (e.g., classified, tagged, and/or indexed). For the new image 216, the image processor 220 searches the feature space for a match between the image features 218 and the mapped features. For instance, the matching uses a nearest neighbor algorithm. The nearest neighbor image(s) is selected from the feature space. Knowledge about the selected image(s) (e.g., labels, tags, and/or indexes) is transferred to the new image 216. For instance, the knowledge transfer includes updating the mapping of the feature space 222 and/or adding the knowledge to metadata of the new image 216. Accordingly, the image processor 220 stores the new image 216 with the knowledge as a processed image 224.
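The matching and knowledge transfer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature space is modeled as a list of (feature vector, tags) pairs, Euclidean distance is assumed as the similarity measure, and the names `nearest_neighbor` and `transfer_knowledge` are hypothetical rather than taken from the disclosure.

```python
import math

def nearest_neighbor(feature_space, query):
    """Return the (features, tags) entry closest to `query` (Euclidean distance assumed)."""
    return min(feature_space, key=lambda entry: math.dist(entry[0], query))

def transfer_knowledge(feature_space, new_features):
    """Copy the tags of the nearest known image to the new image and update the mapping."""
    _, tags = nearest_neighbor(feature_space, new_features)
    processed = {"features": list(new_features), "tags": list(tags)}
    # Updating the mapping: the new image becomes a known entry of the feature space.
    feature_space.append((list(new_features), list(tags)))
    return processed

# Hypothetical feature space with two already-processed images.
feature_space = [
    ([0.9, 0.1, 0.0], ["beach", "campfire"]),
    ([0.0, 0.8, 0.3], ["forest"]),
]
processed = transfer_knowledge(feature_space, [0.8, 0.2, 0.1])
print(processed["tags"])
```

A production feature space would likely use an approximate nearest neighbor index rather than a linear scan, but the lookup-then-transfer pattern stays the same.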

FIG. 3 illustrates another example of an implementation for generating features of an image and processing the image based on the features. In this example implementation, a feature generator 310 is configured to generate the image features and to initiate the application of one or more image-related operations on the image.

Similarly to the feature generator 210 described in connection with FIG. 2, the feature generator 310 implements a convolutional neural network 312. The convolutional neural network 312 is trained based on multiple training datasets 314 and respective training tasks. In this example, the convolutional neural network 312 also includes an output layer. Neurons (e.g., nodes in a structure representing the network 312) of the output layer are mapped to image-related tasks. For instance, one neuron may be associated with predicting how well a label is applicable to the image. Another neuron may be associated with predicting how well a tag is applicable to objects within the image. Accordingly, when a new image 316 is received, the output layer outputs predictions across different image-related tasks. The feature generator 310 selects a set of the tasks based on the respective predictions. For instance, the image-related task having the highest likelihood, or the image-related task(s) having likelihood(s) over a likelihood threshold, is selected. The feature generator 310 initiates the selected image-related task(s), thereby causing one or more image-related operations to be applied to the new image 316. For instance, the selected task relates to a prediction that a label has the highest likelihood of being applicable to the image. Accordingly, an image-related operation is applied to add the label to the image and/or classify the image based on the label.
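The two selection rules mentioned above (highest likelihood, or every likelihood over a threshold) can be illustrated with a short Python sketch. The function name `select_tasks`, the task names, and the likelihood values are hypothetical and serve only as an example of the selection step.

```python
def select_tasks(predictions, threshold=None):
    """Select image-related tasks from per-task likelihoods.

    With no threshold, only the most likely task is returned; otherwise every
    task whose likelihood exceeds the threshold is returned.
    """
    if threshold is None:
        return [max(predictions, key=predictions.get)]
    return [task for task, likelihood in predictions.items() if likelihood > threshold]

# Hypothetical output-layer predictions for a new image.
predictions = {"label:beach": 0.92, "label:forest": 0.05, "tag:campfire": 0.71}

print(select_tasks(predictions))                 # highest-likelihood rule
print(select_tasks(predictions, threshold=0.5))  # threshold rule
```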

FIG. 4 illustrates an example of a convolutional neural network 400 that is trained across multiple training datasets and multiple training tasks. As illustrated, the convolutional neural network 400 includes a hidden layer 410. Although a single hidden layer 410 is illustrated, the hidden layer 410 itself can include a hierarchy of a number of hidden layers. One end of the hierarchy is connected to an input layer 420 of the convolutional neural network 400. Another end of the hierarchy is connected to an output layer 430 of the convolutional neural network 400.

The input layer 420 maps input data of, for example, images to input neurons of the convolutional neural network 400. As illustrated, during the training of the convolutional neural network 400, “K” training datasets are used, where “K” is an integer equal to or greater than two. Hence, the training datasets 420(1), 420(2), . . . , 420(K) are mapped to the input layer 420. In an example, each training dataset is mapped to a collection of the input neurons and has a set of labels.

The output layer 430 maps tasks to output neurons of the convolutional neural network 400. As illustrated, during the training of the convolutional neural network 400, “K” training tasks are used. Hence, the training tasks 430(1), 430(2), . . . , 430(K) are mapped to the output layer 430. In an example, each training task is mapped to a collection of the output neurons. In addition, each training task corresponds to at least one training dataset that has a set of labels. The training task is also associated with a training loss function, such as a regression loss or a classification loss. FIG. 4 represents this association with a loss layer 440. The loss layer 440 includes “K” training loss functions 432(1), 432(2), . . . , 432(K). Each training loss function corresponds to a training task. During the training, a training loss function for a training task imposes a training loss (e.g., a cost or a penalty) for a deviation between a predicted label and a true label given the training task and the respective training dataset and set of labels. The training aims at minimizing the training loss. A training domain may refer to a training dataset and a training task that are related, where the training dataset has a set of labels and where the training task is associated with a loss function.
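To make the notion of a training loss concrete, the following Python sketch shows one common choice for each of the two loss types named above. The specific functions (squared error for regression, cross-entropy for classification) are assumptions chosen for illustration; the disclosure does not prescribe particular loss formulas.

```python
import math

def regression_loss(predicted, true):
    """Squared-error loss, e.g. for an age prediction task (assumed choice)."""
    return (predicted - true) ** 2

def classification_loss(probabilities, true_index):
    """Cross-entropy loss, e.g. for a gender classification task (assumed choice)."""
    return -math.log(probabilities[true_index])

# A larger deviation between the predicted label and the true label costs more.
print(regression_loss(30.0, 25.0), regression_loss(40.0, 25.0))
print(classification_loss([0.9, 0.1], 0), classification_loss([0.6, 0.4], 0))
```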

The hidden layer 410 includes a number of layers, such as a convolutional layer and a fully connected layer (e.g., a high-level reasoning layer having neurons connected to lower-level layers and capable of generating feature representations from inputs to such lower-level layers). Each layer uses a set of parameters, such as weights and biases. The training aims at updating the parameters given the minimization of training losses across the loss layer 440. In an example, the parameters are optimized using a back-propagation algorithm, such as a stochastic gradient descent (SGD) algorithm.

In an example, the formulation of the training is expressed as follows. Denote the training domains as {(D₁, L₁), (D₂, L₂), . . . , (D_(K), L_(K))}, where “D_(i)” and “L_(i)” represent the training data and the training loss function for the i-th training domain. Training for each training domain can be represented as the following optimization problem:

$\min\limits_{\theta_{H},\theta_{L_{i}}} L_{i}\left(D_{i};\theta_{H},\theta_{L_{i}}\right).$

“θ_(H)” represents the parameters for the hidden layer 410. These parameters are used to compute the feature embedding (e.g., to generate a feature representation as an output from a fully connected layer of the hidden layer 410). “θ_(L_i)” represents the parameters for the output layer, including linear weights applied on the feature space for a specific task. The goal of the feature training is to learn a good hidden layer configuration “θ_(H).”

For the multiple domain training (e.g., multiple training datasets and multiple training tasks), the optimization problem can be expressed as:

$\begin{matrix}{\min\limits_{\theta_{H},\theta_{L_{1}},\ldots,\theta_{L_{K}}}\sum\limits_{i = 1}^{K} w_{i} L_{i}\left( D_{i};\theta_{H},\theta_{L_{i}} \right),} & (1)\end{matrix}$

where “w_(i)” is the weight for training domain “i” in the training process. In an example, equation (1) is optimized using SGD.
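The objective of equation (1) can be illustrated with a minimal sketch. The sketch assumes a toy scalar model in which a single shared parameter stands in for “θ_(H)” and one scalar per domain stands in for “θ_(L_i)”; the names `squared_loss` and `multi_domain_objective`, the sample values, and the weights are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, true label y); toy scalar data
LossFn = Callable[[float, float, Sequence[Sample]], float]


def squared_loss(theta_h: float, theta_l: float, data: Sequence[Sample]) -> float:
    """Toy L_i: mean squared error of a linear head on a shared embedding.

    The product theta_h * x stands in for the hidden-layer output, and
    theta_l stands in for the task-specific output-layer weight.
    """
    return sum((theta_l * (theta_h * x) - y) ** 2 for x, y in data) / len(data)


def multi_domain_objective(
    theta_h: float,
    heads: Sequence[float],
    domains: Sequence[Tuple[Sequence[Sample], LossFn]],
    weights: Sequence[float],
) -> float:
    """Weighted sum over domains of w_i * L_i(D_i; theta_H, theta_L_i)."""
    return sum(
        w * loss_fn(theta_h, theta_l, data)
        for w, theta_l, (data, loss_fn) in zip(weights, heads, domains)
    )


# Two hypothetical training domains, each a (dataset, loss function) pair.
domains = [([(1.0, 2.0)], squared_loss), ([(2.0, 2.0)], squared_loss)]
objective = multi_domain_objective(1.0, [1.0, 1.0], domains, [0.5, 2.0])
```

In this sketch only the first domain contributes a nonzero loss, so the weight of the first domain alone determines the value of the objective. A training procedure would adjust the shared parameter and the per-domain parameters to reduce this value.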

For clarity of explanation, FIG. 4 illustrates “K” training datasets and “K” training tasks. However, a one-to-one relationship need not exist between training datasets and training tasks. Instead, each training dataset can relate to a number of training tasks. Conversely, each training task can relate to a number of training datasets. To illustrate, consider the example of the following training domain. A single training dataset relates to “M” training tasks, where “M” is equal to or greater than two. In this example, the training dataset is mapped to the input layer 420. Likewise, each of the “M” training tasks is mapped to the output layer 430 and is associated with an individual loss function. The training loss function for the training domain is updated to include the sum of the individual loss functions.
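For the single-dataset, multi-task example, the loss function of the training domain can be written out in the notation of equation (1). The symbols “L_(m)” and “θ_(L_m)”, denoting the individual loss function and the output-layer parameters of task “m,” are introduced here for illustration only and are an assumption about how the sum would be expressed.

```latex
L\left( D;\theta_{H},\theta_{L_{1}},\ldots,\theta_{L_{M}} \right)
  = \sum\limits_{m = 1}^{M} L_{m}\left( D;\theta_{H},\theta_{L_{m}} \right)
```

Under this reading, the hidden-layer parameters “θ_(H)” are shared by all “M” tasks, while each task keeps its own output-layer parameters.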

FIG. 5 illustrates another example of a convolutional neural network 500 that is trained across multiple training datasets and multiple training tasks. The convolutional neural network 500 includes layers similar to those of the convolutional neural network 400 of FIG. 4, such as a hidden layer 510, an input layer 520, an output layer 530, and a loss layer 540. Similarities are not repeated herein in the interest of clarity. In addition to these layers, the convolutional neural network 500 includes a transformation layer 522.

In particular, and as previously described, the training datasets 520(1), 520(2), . . . , 520(K) need not use the same format for the training data. For instance, the training dataset 520(1) can include image data for training images of a certain size, resolution, or file format. In comparison, the training dataset 520(2) can include image data for training images of a different size, resolution, or file format. On the other hand, the training dataset 520(K) can include non-image data, such as text data or audio data. In this case, the transformation layer 522 is configured to generate a feature representation (e.g., a feature vector) from each training dataset, where the feature representation uses a uniform format (e.g., a feature vector format common to the different feature vectors). The various feature representations are then input to the hidden layer 510. Thus, the transformation layer 522 transforms the training datasets that have different data formats into training feature sets 522(1), 522(2), . . . , 522(K) that have a uniform format.

In an example, the transformation layer 522 implements a feature embedding function to generate (e.g., learn) the training feature sets 522(1), 522(2), . . . , 522(K). For instance, during the training of the convolutional neural network 500, the parameters “θ_(I_i)” of the transformation layer 522 are also updated through a backpropagation algorithm. In another illustration, the transformation layer 522 represents on its own a neural network (e.g., a convolutional neural network) or a convolutional layer. In this illustration, the transformation layer 522 is trained to learn the features given the format type of the training data. In other words, the transformation layer 522 is trained to generate the features from image data and non-image data given the specific data formats.

FIGS. 6-9 illustrate example flows for training and using a feature generator, such as one that implements a convolutional neural network. For illustrative purposes, a computer system is illustrated as performing operations of the illustrative flows. The computer system implements some or all of the components and modules illustrated in FIGS. 1-5, including the feature generator. Nevertheless, other systems, or specific modules hosted on such systems, can be configured to implement one or more of the operations and/or one or more steps of the operations. In addition, while the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered. Similarities between the operations across the example flows are not repeated.

FIG. 6 illustrates an example of a flow for training the feature generator, generating features from image data of an image, and performing an image-related operation based on the features. The example flow starts at operation 602, where the computer system accesses a first training dataset. In an example, the first training dataset is available from local storage of the computer system or from remote storage accessible over a data network. The first training dataset corresponds to a first training domain and includes image data of training images and a set of data labels. The first training domain also includes a first training task defined based on the data labels and associated with a first training loss function.

At operation 604, the computer system accesses a second training dataset. The second training dataset corresponds to a second training domain. Further, the second training dataset includes image or non-image data and a set of data labels. The second training domain also includes a second training task defined based on the data labels and associated with a second training loss function.

For illustrative purposes, two training datasets and two training tasks are illustrated in the example flow of FIG. 6. However, a larger number of training datasets and/or of training tasks can also be similarly used for the training of the feature generator, as described in connection with FIGS. 1-5.

At operation 606, the computer system trains the image feature generator based on the first training dataset and the second training dataset. In an example, the training includes updating parameters of the feature generator. In particular, the feature generator implements a convolutional neural network, such as the one described above with respect to FIG. 4 or FIG. 5. The training includes minimizing the training loss functions of the training domains as expressed in equation (1) above. For instance, the sum of the first loss function and the second loss function is minimized. The minimization corresponds to a set of optimized parameters of an output layer of the convolutional neural network. Parameters of the hidden layer(s) of the convolutional neural network are updated based on a backpropagation algorithm, such as one that uses SGD.

At operation 608, the computer system accesses an image. In an example, the image is published from a computing source or is generated locally at the computer system. In another example, the image is received from a client in communication with the computer system. Regardless of the image source, the computer system accesses the image in order to perform an image-related operation such as to classify, tag, or index the image.

At operation 610, the computer system generates image features for the image. In an example, image data of the image is input to the feature generator. An output of the feature generator includes a feature vector (or, more generally, some other feature representation) of the image. In particular, the feature vector includes features learned by the convolutional neural network. The learning is based on a feature embedding function of the convolutional neural network as defined, in part, by the parameters of the convolutional neural network through the training.

At operation 612, the computer system performs an image-related operation on the image based on the image features. In an example, the feature vector enables an application of the image-related operation to the image. For instance, the computer system implements an image processor, as described in connection with FIG. 2. The image processor maintains a feature space of known images and associated knowledge (e.g., labels, tags, and indexes). The computer system matches the feature vector to at least one known feature vector from the feature space. The matching uses a nearest neighbor algorithm. The known feature vector corresponds to a known image. The computer system transfers the knowledge about the known image to the image. The knowledge enables the application of the image-related operation (e.g., if a label is added to the image, the image is classified according to the label). In another example, the feature generator initiates the image-related operation without using the feature space, as described in connection with FIG. 3.

For instance, a layer of the convolutional neural network, such as a fully connected layer, predicts image-related tasks (e.g., how well a label applies to an image or how well a tag applies to an object within an image). Accordingly, an image-related task that has an acceptable likelihood (e.g., one exceeding a threshold) is selected and initiated, leading to the application of a respective image-related operation. For instance, if adding a particular label has the highest likelihood, the addition of the particular label to the image is initiated. The addition enables the classification of the image according to the particular label.
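The two routes described for operation 612 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the Euclidean distance, the dictionary-based knowledge records, the function names `transfer_knowledge` and `select_tasks`, and the threshold value are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Knowledge = Dict[str, str]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_knowledge(
    query: Vector, feature_space: List[Tuple[Vector, Knowledge]]
) -> Knowledge:
    """Copy the knowledge of the nearest known feature vector to the query."""
    _, knowledge = min(feature_space, key=lambda entry: euclidean(query, entry[0]))
    return dict(knowledge)


def select_tasks(likelihoods: Dict[str, float], threshold: float) -> List[str]:
    """Keep tasks whose likelihood exceeds the threshold, most likely first."""
    accepted = [task for task, p in likelihoods.items() if p > threshold]
    return sorted(accepted, key=lambda task: likelihoods[task], reverse=True)


# Route 1: match against a hypothetical feature space of known images.
feature_space = [([0.0, 0.0], {"label": "tree"}), ([10.0, 10.0], {"label": "car"})]
matched = transfer_knowledge([1.0, 0.5], feature_space)

# Route 2: select predicted tasks directly, without the feature space.
selected = select_tasks(
    {"add label tree": 0.9, "add label car": 0.2, "add tag leaf": 0.6}, 0.5
)
```

In the first route the query vector lies closest to the known “tree” vector, so the “tree” label is transferred. In the second route only the tasks with a likelihood above the assumed threshold of 0.5 are retained, ordered from most to least likely.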

At operation 614, the computer system provides access to a processed image. In an example, the access is provided to a client in communication with the computer system over a data network. The processed image represents the image after the application of the image-related operation. For instance, the processed image is a stock image that has been classified, tagged, or indexed. The client can search, browse, or retrieve the stock image.

FIG. 7 illustrates an example of a flow for training the feature generator based on batches of training data. In particular, multiple training datasets are used for the training. Each training dataset is divided into a number of subsets. During the training, the subsets are input to the feature generator in sequential batches. The batch training can reduce the computational burden associated with training of the feature generator (e.g., of the convolutional neural network).

The example flow starts at operation 702, where the computer system accesses a training subset “i” of a training dataset “j.” For example, “K” training datasets are available, where “K” is an integer equal to or greater than two. Each of the training datasets is divided into “M” training subsets. Hence, the computer system accesses, for instance, the first training subset of the first training dataset, then the first training subset of the second training dataset, and so on and so forth until all “M” training subsets across all “K” training datasets have been accessed, as further illustrated in the next operations.

At operation 704, the computer system minimizes the loss function “j.” In particular, at this operation, the computer system inputs the accessed training subset “i” of the training dataset “j” to the convolutional neural network. The loss function “j” corresponds to the training dataset “j.” The loss function “j” is minimized by optimizing parameters of the output layer given the training subset “i.” To illustrate, the computer system accesses the third training subset of the second training dataset. The second training dataset is associated with a second loss function. The second training loss is minimized by optimizing the parameters given the third training subset.

At operation 706, the computer system updates the parameters of the convolutional neural network. These parameters are updated based on the minimization of the loss function “j.” For instance, a backpropagation algorithm, such as one using SGD, is applied for the update. Continuing with the previous illustration, the convolutional neural network is trained based on applying the backpropagation algorithm once the second loss function is minimized.

At operation 708, the computer system determines whether additional training datasets exist or not. If the training subsets “i” for all “K” training datasets have been analyzed, operation 710 is followed. Otherwise, operation 712 is followed, where the computer system selects the next training dataset “j+1.” The computer system then accesses the training subset “i” of the next training dataset “j+1,” thereby iteratively performing operations 702-708. To illustrate, consider the example of three training datasets. In this example, the computer system has already batch trained the convolutional neural network using the first training subsets from only the first two training datasets. Accordingly, the computer system selects the third training dataset and continues the batch training using the first training subset of the third training dataset.

At operation 710, the computer system determines whether additional training subsets exist or not. If the “M” training subsets have been used, the training is complete as indicated at operation 714. Otherwise, the computer system determines that other training subset(s) have not been used yet. Accordingly, at operation 716, the computer system selects the next training subset “i+1.” The computer system then accesses the next training subset, thereby iteratively performing operations 702-710. To illustrate and continuing with the previous three training datasets example, if the computer system determines that the second training subset of the first training dataset has not been used yet, the computer system accesses this second training subset and accordingly continues training the convolutional neural network.

In an example, the above training of FIG. 7 uses a frequency for forward and back propagations. In other words, the training alternates mini-batches of datasets (e.g., alternates the training subsets “i” across the different training datasets “j”) at a particular frequency. The frequency can be expressed as a number of iterations. For instance, the convolutional neural network is trained with one hundred iterations (or some other number of iterations) by using a training subset of one training dataset, then one hundred iterations (or some other number that need not be the same as the first number of iterations) with a next training subset of a next training dataset, and so on and so forth until the training completes.
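The visiting order of the flow of FIG. 7 can be sketched as a simple schedule. The generator below is a hypothetical helper, not part of the disclosure; it assumes zero-based indices and the same number of iterations per visit for every training dataset, although the text notes that the numbers need not be equal.

```python
from typing import Iterator, Tuple


def alternating_schedule(
    num_datasets: int, num_subsets: int, iterations_per_visit: int = 1
) -> Iterator[Tuple[int, int]]:
    """Yield (dataset j, subset i) pairs in the order of the FIG. 7 flow.

    For each subset index i, every dataset j is visited in turn, and each
    visit is repeated for a fixed number of training iterations.
    """
    for i in range(num_subsets):          # outer loop: operations 710/716
        for j in range(num_datasets):     # inner loop: operations 708/712
            for _ in range(iterations_per_visit):
                yield (j, i)


# Three datasets, two subsets each, one iteration per visit.
order = list(alternating_schedule(3, 2))
```

Each yielded pair identifies the training subset that would be input to the network and the loss function (that of dataset “j”) that would be minimized at that step.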

FIG. 8 illustrates another example of a flow for training the feature generator based on batches of training data, according to embodiments of the present disclosure. In this example, training subsets from different training datasets are combined such that the batch training of the convolutional neural network across the training datasets is performed in a parallel manner.

The example flow starts at operation 802, where the computer system accesses a training subset “i” of a training dataset “j.” At operation 804, the computer system determines if additional training datasets exist. If so, operation 806 is performed, where the computer system selects the next training dataset “j+1” to access the training subset “i” of the next training dataset “j+1,” thereby iteratively repeating operations 802-804. To illustrate, consider the example of three training datasets. Each training dataset is divided into two training subsets. The computer system determines that the first training subset of the first training dataset has been accessed. In this example, the computer system also determines that the first training subset of the second training dataset has not been accessed yet. Accordingly, the computer system accesses the first training subset of the second training dataset. The access is also repeated across the third training dataset.

At operation 808, the computer system combines the accessed training subsets “i” across the “K” training datasets. The computer system accordingly generates a training set “i” that combines the training subsets “i.” In the previous example, the computer system combines all three first training subsets of the three training datasets to form a first training set. In an example, the combination allows the computer system to track the source of training data (e.g., which training subset of which training dataset the training data belongs to). The tracking is used in the minimization of the loss functions. For instance, the combination includes interleaving the training data of the training subsets “i.” For instance, the training set “i” is expressed as S_(i): {(d_(i1,1), . . . , d_(iK,1)); . . . ; (d_(i1,N), . . . , d_(iK,N))}, where “d_(ij,l)” represents the “l”-th item of training data from training subset “i” of training dataset “j.”

At operation 810, the computer system minimizes loss functions across the “K” training datasets. For example, the computer system inputs the training set “i” to the convolutional neural network and minimizes the sum of the loss functions, as expressed in equation (1). In this example, the loss function “j” of a training dataset “j” is computed based on the training data corresponding to the training dataset “j.” In other words, by tracking the source of the training data in the training set “i,” as explained at operation 808, the computer system can properly compute the training loss functions.
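The combination at operation 808 and the source-aware loss computation at operation 810 can be sketched as follows. The round-robin interleaving, the tagging of each sample with its dataset index, and the function names `interleave` and `per_dataset_loss` are assumptions made for illustration; the per-sample loss functions are hypothetical stand-ins.

```python
from typing import Any, Callable, List, Sequence, Tuple

Tagged = Tuple[int, Any]  # (source dataset index j, training sample)


def interleave(subsets: Sequence[Sequence[Any]]) -> List[Tagged]:
    """Combine subsets round-robin, tagging each sample with its source."""
    combined: List[Tagged] = []
    longest = max((len(s) for s in subsets), default=0)
    for position in range(longest):
        for j, subset in enumerate(subsets):
            if position < len(subset):
                combined.append((j, subset[position]))
    return combined


def per_dataset_loss(
    training_set: Sequence[Tagged], loss_fns: Sequence[Callable[[Any], float]]
) -> List[float]:
    """Average each loss function over only the samples from its dataset."""
    totals = [0.0] * len(loss_fns)
    counts = [0] * len(loss_fns)
    for j, sample in training_set:
        totals[j] += loss_fns[j](sample)
        counts[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# First training subsets of three hypothetical training datasets.
training_set = interleave([["a1", "a2"], ["b1", "b2"], ["c1"]])
losses = per_dataset_loss(
    training_set, [lambda s: 1.0, lambda s: 2.0, lambda s: 3.0]
)
```

Because every sample carries its dataset index, the loss function of a given training dataset is never applied to samples from another dataset, which is the property that the tracking is meant to provide.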

At operation 812, the computer system updates the parameters of the convolutional neural network. These parameters are updated based on the minimization of the loss functions. For instance, a backpropagation algorithm, such as one using SGD, is applied for the update.

At operation 814, the computer system determines whether additional training subsets exist or not. If the “M” training subsets have been used, the training is complete as indicated at operation 816. Otherwise, the computer system determines that other training subset(s) have not been used yet. Accordingly, at operation 818, the computer system selects the next training subset “i+1.” The computer system then accesses the next training subset, thereby iteratively performing operations 802-814. To illustrate and continuing with the previous three training datasets example, if the computer system determines that the second training subsets have not been used yet, the computer system accesses the second training subsets across the three training datasets to form a second training set and accordingly continues training the convolutional neural network.

In an example, the above training of FIG. 8 need not use the same size of training subsets to create a training set. Instead, training samples can be combined from different datasets with a proportion to balance the training according to a desired application. In other words, where a first training subset from a first training dataset is combined with a second training subset from a second training dataset, the first and second training subsets can have different sizes. For instance, the first training subset includes one hundred training samples (or some other number), while the second training subset includes fifty training samples (or some other number), corresponding to a 2:1 proportion. The proportion can be decided by an operator setting up the training of the convolutional neural network. The operator can consider multiple factors including, for example, whether to bias or balance the training towards a particular training dataset or training task associated with the training dataset. For instance, if the first training dataset is more relevant to a particular application, the proportion can increase the number of training samples used from the first training dataset relative to the second one.
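The proportional combination can be sketched as follows, assuming random sampling without replacement from each training dataset. The function name `mix_by_proportion`, the `unit` parameter (the number of samples that corresponds to one share of the proportion), and the seeded random generator are hypothetical choices made for this example.

```python
import random
from typing import Any, List, Sequence, Tuple


def mix_by_proportion(
    datasets: Sequence[Sequence[Any]],
    proportions: Sequence[int],
    unit: int,
    rng: random.Random,
) -> List[Tuple[int, Any]]:
    """Draw proportions[j] * unit samples from dataset j, tagged with j."""
    training_set: List[Tuple[int, Any]] = []
    for j, (data, share) in enumerate(zip(datasets, proportions)):
        count = min(share * unit, len(data))  # never request more than exists
        training_set.extend((j, sample) for sample in rng.sample(list(data), count))
    return training_set


# A 2:1 proportion with a unit of fifty gives 100 and 50 samples.
first = list(range(200))
second = list(range(200))
training_set = mix_by_proportion([first, second], [2, 1], 50, random.Random(0))
from_first = sum(1 for j, _ in training_set if j == 0)
from_second = sum(1 for j, _ in training_set if j == 1)
```

An operator who wants to bias the training towards the first training dataset would raise its share in `proportions`; equal shares would balance the two training datasets.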

FIG. 9 illustrates another example of a flow for training the feature generator based on a sequential training approach. In particular, multiple training datasets are used for the training. Instead of dividing each training dataset into batches as illustrated in FIGS. 7 and 8, the training datasets are sequentially input to the feature generator. In other words, a first training dataset is first input to train the feature generator. Once that training is complete, the next training dataset is input for additional training, and so on and so forth until the feature generator is trained across all of the desired training datasets.

The example flow starts at operation 902, where the computer system accesses a training dataset “j.” For example, “K” training datasets are available, where “K” is an integer equal to or greater than two. Hence, the computer system accesses, for instance, the first training dataset, then the next training dataset, and so on and so forth until all “K” training datasets have been accessed, as further illustrated in the next operations.

At operation 904, the computer system minimizes the loss function “j.” In particular, at this operation, the computer system inputs the accessed training dataset “j” to the convolutional neural network. The loss function “j” corresponds to the training dataset “j.” The loss function “j” is minimized by optimizing parameters of the output layer given the training dataset “j.” To illustrate, the computer system accesses the first training dataset. The first training dataset is associated with a first loss function. The first training loss is minimized by optimizing the parameters given the training samples from the first training dataset.

At operation 906, the computer system updates the parameters of the convolutional neural network. These parameters are updated based on the minimization of the loss function “j.” For instance, a backpropagation algorithm, such as one using SGD, is applied for the update. Continuing with the previous illustration, the convolutional neural network is trained based on applying the backpropagation algorithm once the first loss function is minimized. In an example, the parameters of the shared convolutional neural network are fixed once the update is complete. At a next iteration of the training, a next training dataset is used to fine tune the output layer (e.g., to minimize the loss function associated with the next training dataset).

At operation 908, the computer system determines whether additional training datasets exist or not. If so, operation 910 is followed, where the computer system selects the next training dataset “j+1.” The computer system then accesses the next training dataset “j+1,” thereby iteratively performing operations 902-908. Continuing with the previous example, the computer system has already trained the convolutional neural network using the first training dataset. Accordingly, the computer system selects the second training dataset and continues the training. On the other hand, if no remaining datasets exist, operation 912 is followed, where the computer system completes the training.
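The sequential approach of FIG. 9, including the variant in which the shared parameters are fixed after the first training dataset, can be sketched with a toy scalar model. This is an illustration under stated assumptions, not the disclosed implementation: a single scalar stands in for the shared parameters, one scalar per dataset stands in for the output layer, full-batch gradient descent replaces SGD for determinism, and the names, step count, and learning rate are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, true label y)


def gradient_step(
    theta_h: float, theta_l: float, data: Sequence[Sample],
    lr: float, update_shared: bool,
) -> Tuple[float, float]:
    """One full-batch gradient step on the mean squared error."""
    grad_h = grad_l = 0.0
    for x, y in data:
        err = theta_l * theta_h * x - y
        grad_h += 2 * err * theta_l * x / len(data)
        grad_l += 2 * err * theta_h * x / len(data)
    if update_shared:  # shared parameters move only while not frozen
        theta_h -= lr * grad_h
    theta_l -= lr * grad_l
    return theta_h, theta_l


def train_sequentially(
    datasets: Sequence[Sequence[Sample]], steps: int = 200, lr: float = 0.05
) -> Tuple[float, List[float]]:
    """Train on dataset 0 jointly, then fine tune only the later heads."""
    theta_h = 1.0
    heads: List[float] = []
    for j, data in enumerate(datasets):
        theta_l = 1.0  # fresh output-layer weight for each dataset
        for _ in range(steps):
            theta_h, theta_l = gradient_step(
                theta_h, theta_l, data, lr, update_shared=(j == 0)
            )
        heads.append(theta_l)
    return theta_h, heads


theta_h, heads = train_sequentially([[(1.0, 2.0)], [(1.0, 3.0)]])
```

After the run, the shared parameter combined with the first head still fits the first training dataset, because the shared parameter was not changed while the second head was fine tuned on the second training dataset.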

Turning to FIG. 10, the figure illustrates examples of components for implementing some or all of the components of a computer system 1000. The computer system 1000 is an example of any of the computer systems described in connection with FIGS. 1-8. In an example, the computer system 1000 implements the image service 110 of FIG. 1. Although these components are illustrated as belonging to a same computing system 1000, the computing system 1000 can be distributed.

The computing system 1000 includes at least a processor 1002, a memory 1004, a storage device 1006, input/output peripherals 1008, communication peripherals 1010, and an interface bus 1012. The interface bus 1012 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computing system 1000. The memory 1004 and the storage device 1006 include computer readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard-drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example Flash® memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure. The memory 1004 and the storage device 1006 also include computer readable signal media. A computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computing system 1000.

Further, the memory 1004 includes an operating system, programs, and applications. The processor 1002 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 1004 and/or the processor 1002 can be virtualized and can be hosted within another computing system of, for example, a cloud network or a datacenter. The input and output peripherals 1008 include user interfaces such as a keyboard, screen, microphone, speaker, other input/output devices, and computing components such as graphical processing units, serial ports, parallel ports, universal serial bus, and other input/output peripherals. The input/output peripherals 1008 are connected to the processor 1002 through any of the ports coupled to the interface bus 1012. The communication peripherals 1010 are configured to facilitate communication between the computing system 1000 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.

While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosure.

Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.

The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.

Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.

Conditional language used herein, such as, among others, “can,” “could,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.

The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.

What is claimed is:
 1. A computer-implemented method associated withusing a convolutional neural network, the method comprising: accessing,by a computer system, a first training dataset comprising first imagedata, the first training dataset associated with a first task and afirst label applicable to the first image data; accessing, by thecomputer system, a second training dataset comprising second trainingdata, the second training dataset associated with a second task and asecond label; and training, by the computer system, the convolutionalneural network by at least: minimizing a first loss function for thefirst training dataset based on the first task and a second lossfunction for the second training dataset based on the second task, andupdating parameters of the convolutional neural network based on theminimizing of the first loss function and the second loss function,wherein a feature vector is generated in response to inputting imagedata of an image to the convolutional neural network upon completion ofthe training, and wherein the feature vector enables an application ofan image-related operation to the image, the image-related operationcomprising at least one of: an image classification, an image tagging,or an image retrieval.
 2. The computer-implemented method of claim 1, further comprising: mapping the first training dataset to an input layer of the convolutional neural network; mapping the first task to an output layer of the convolutional neural network; configuring a fully connected layer of the convolutional neural network, the fully connected layer connected with the output layer; and generating, by the fully connected layer, features from the first image data.
 3. The computer-implemented method of claim 2, wherein the training further comprises: computing weights of the output layer based on the minimizing of the first loss function; and back-propagating the weights to the fully connected layer.
 4. The computer-implemented method of claim 1, further comprising mapping the first task and the second task to respective output layers of the convolutional neural network, and wherein training the convolutional neural network comprises iteratively: generating a training subset by combining data from the first training dataset and the second training dataset; inputting the training subset to the convolutional neural network; updating weights of the output layers by minimizing the first loss function and the second loss function based on the training subset; and updating the parameters of the convolutional neural network based on the weights of the output layers.
 5. The computer-implemented method of claim 1, further comprising mapping the first task and the second task to respective output layers of the convolutional neural network, and wherein training the convolutional neural network comprises iteratively: inputting first data of the first training dataset to the convolutional neural network; updating first weights of a first output layer by minimizing the first loss function based on the first data; updating the parameters of the convolutional neural network based on the first weights of the first output layer; inputting second data of the second training dataset to the convolutional neural network; updating second weights of a second output layer by minimizing the second loss function based on the second data; and updating the parameters of the convolutional neural network based on the second weights of the second output layer.
 6. The computer-implemented method of claim 5, wherein updating the first weights comprises iteratively training the convolutional neural network for a predetermined number of iterations prior to inputting the second data of the second training dataset.
 7. The computer-implemented method of claim 1, further comprising: mapping the first task and the second task to respective output layers of the convolutional neural network, and wherein training the convolutional neural network comprises: inputting the first training dataset to the convolutional neural network; updating shared layers of the convolutional neural network by minimizing the first loss function based on the first training dataset; inputting the second training dataset to the convolutional neural network; and tuning an output layer corresponding to the second training dataset by minimizing the second loss function based on the second training dataset.
 8. The computer-implemented method of claim 1, wherein the second training dataset comprises second image data, wherein at least one of: the second task is different from the first task or the second label is different from the first label.
 9. The computer-implemented method of claim 1, wherein the first training dataset and the second training dataset are associated with different data formats, and further comprising: mapping the first training dataset and the second training dataset to respective input layers of the convolutional neural network; training the input layers based on the minimizing of the first loss function and the second loss function; and generating, by the input layers, input feature vectors having a common vector format.
 10. A system comprising: a processor; a memory communicatively coupled to the processor and bearing instructions that, upon execution by the processor, cause the system to at least: input image data of an image associated with an image-related application to a convolutional neural network, wherein: the convolutional neural network is trained by at least updating parameters of the convolutional neural network based on minimizing a plurality of loss functions, and each of the plurality of loss functions is associated with a different training dataset and is minimized for the training dataset based on a task and a label specific to the training dataset, each training dataset comprising image data; generate a feature vector for the image based on the parameters of the convolutional neural network; and provide the feature vector to the image-related application, wherein the feature vector enables the image-related application to perform an image-related operation on the image, and wherein the image-related operation comprises at least one of: an image classification, an image tagging, or an image retrieval.
 11. The system of claim 10, wherein a first label of a first training dataset associated with a first loss function is different from a second label of a second dataset associated with a second loss function.
 12. The system of claim 10, wherein a first task of a first training dataset associated with a first loss function is different from a second task of a second dataset associated with a second loss function.
 13. The system of claim 10, wherein a first label of a first training dataset associated with a first loss function is different from a second label of a second training dataset associated with a second loss function, and wherein a first task of the first training dataset and a second task of the second training dataset are of the same task type and each is specific to the first label or the second label.
 14. The system of claim 10, wherein a first training dataset is associated with a plurality of image-related tasks, wherein a first loss function associated with the first training dataset comprises a loss function corresponding to each image-related task of the plurality of image-related tasks, and wherein the first loss function is minimized by at least minimizing a sum of a plurality of loss functions corresponding to the plurality of image-related tasks.
 15. A computer-readable storage medium storing instructions associated with an image-related application, the instructions when executed on a computing device configure the computing device to perform operations comprising: providing image data of an image associated with the image-related application to a convolutional neural network, wherein: the convolutional neural network is trained by at least updating parameters of the convolutional neural network based on minimizing a plurality of loss functions, and each of the plurality of loss functions is associated with a different training dataset and is minimized for the training dataset based on a task and a label specific to the training dataset, each training dataset comprising image data; receiving, from the convolutional neural network, a feature vector for the image based on the parameters of the convolutional neural network; and providing an image-related operation based on the feature vector, the image-related operation comprising at least one of: an image classification, an image tagging, or an image retrieval.
 16. The computer-readable storage medium of claim 15, wherein the operations further comprise: populating a feature space based on features generated by the convolutional neural network from training datasets; and selecting one of the image classification, the image tagging, and the image retrieval based on the nearest neighbor of the feature vector to feature vectors from the feature space.
 17. The computer-readable storage medium of claim 15, wherein the operations further comprise: selecting the image-related operation from the image classification, the image tagging, and the image retrieval based on a layer of the convolutional neural network, wherein the layer comprises nodes corresponding to the at least one of: an image classification, an image tagging, or an image retrieval.
 18. The computer-readable storage medium of claim 17, wherein the image-related operation is selected based on computation by the layer of likelihoods corresponding to the image classification, the image tagging, and the image retrieval, wherein the computation is based on the feature vector.
 19. The computer-readablestorage medium of claim 15, wherein the convolutional neural network isstored as code in the image-related application.